# A tibble: 1 × 4
mean sd se count
<dbl> <dbl> <dbl> <int>
1 51.7 12.0 0.834 208
Our last graph
We are going to use some real sculpin data.
What is a frequency distribution?
# A tibble: 28 × 2
length_bin n
<fct> <int>
1 [11,13] 4
2 (19,21] 1
3 (23,25] 1
4 (27,29] 2
5 (29,31] 2
6 (31,33] 1
7 (33,35] 4
8 (35,37] 3
9 (37,39] 7
10 (39,41] 9
# ℹ 18 more rows
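A frequency distribution counts how many observations fall into each interval (bin) of a variable. The table above appears to use 2 mm bins that are closed on the right, with the lowest bin also closed on the left. The following is a minimal Python sketch of that binning, not the code that produced the table: the `lengths` values are made up (they are not the sculpin data), and the `bin_label` helper, the bin start of 11, and the bin width of 2 are assumptions read off the bin labels.

```python
from collections import Counter
import math

# Hypothetical lengths in mm; NOT the real sculpin data.
lengths = [11, 12.5, 13, 20, 24.2, 34, 34.8, 38, 38.5, 40]

BIN_START = 11   # assumed left edge of the first bin
BIN_WIDTH = 2    # assumed bin width in mm


def bin_label(x, start=BIN_START, width=BIN_WIDTH):
    """Return the label of the right-closed bin that contains x."""
    # ceil() puts a value sitting exactly on an edge into the bin to its left;
    # max() keeps the lowest edge itself inside the first bin.
    index = max(math.ceil((x - start) / width) - 1, 0)
    lo = start + index * width
    hi = lo + width
    left = "[" if index == 0 else "("
    return f"{left}{lo},{hi}]"


# Count observations per bin, then list the bins in order of their left edge.
counts = Counter(bin_label(x) for x in lengths)
for label in sorted(counts, key=lambda s: float(s[1:].split(",")[0])):
    print(label, counts[label])
```

Bins with no observations simply do not appear, which matches the gaps between bins in the table.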
What happens as sample size changes…
Low sample number - 15
High sample number - 70
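One way to explore the effect of sample size is to draw many random samples of each size from the same population and see how much the sample mean varies. The Python sketch below assumes a normally distributed population with the mean (51.7) and standard deviation (12.0) from the summary table. The normality, the seed, and the number of replicates are assumptions made for illustration only.

```python
import math
import random
import statistics

POP_MEAN = 51.7   # from the summary table
POP_SD = 12.0     # from the summary table


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def spread_of_sample_means(n, replicates=2000, seed=1):
    """Simulate `replicates` samples of size n; return the sd of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(replicates)
    ]
    return statistics.stdev(means)


# Compare the formula with the simulation for the two sample sizes.
for n in (15, 70):
    print(n, round(standard_error(POP_SD, n), 2), round(spread_of_sample_means(n), 2))
```

Under these assumptions the means of samples of 70 vary much less than the means of samples of 15, and the simulated spread should sit close to sd / sqrt(n).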
Can we make assumptions about the distribution of a random variable (e.g., weight) in the population?
Probability distribution:
For a continuous random variable: probability density function (PDF)
PDF: a mathematical expression of the probabilities associated with getting values of the random variable within a given range
Area under curve = 1
i.e., probability of a length between 10 and 80 = 1
Now we could look at a lot of different ranges of lengths:
- probability of a length larger than the mean
- probability of a length larger than 70 mm
- probability of a length between two numbers
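If we are willing to assume the lengths are normally distributed, each of these probabilities is an area under the curve and can be read off the cumulative distribution function. This is a sketch under that assumption, using the mean (51.7) and standard deviation (12.0) from the summary table. The bounds 40 and 60 are arbitrary example values.

```python
from statistics import NormalDist

# Assumed model: lengths ~ Normal(mean = 51.7 mm, sd = 12.0 mm).
lengths = NormalDist(mu=51.7, sigma=12.0)

# P(length > mean): half of a symmetric distribution lies above its mean.
p_above_mean = 1 - lengths.cdf(lengths.mean)

# P(length > 70 mm): area in the upper tail beyond 70.
p_above_70 = 1 - lengths.cdf(70)

# P(40 mm < length < 60 mm): area between two values (arbitrary bounds).
p_40_to_60 = lengths.cdf(60) - lengths.cdf(40)

print(round(p_above_mean, 3), round(p_above_70, 3), round(p_40_to_60, 3))
```

If the real data are noticeably skewed, these numbers would only be rough approximations.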
Normal (Gaussian): symmetrical, bell-shaped
\[f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \mu)^2}{2\sigma^2}}\]
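The formula can be evaluated directly, which also gives a numerical check that the area under the curve is 1. A small Python sketch follows; the values of mu and sigma are taken from the summary table purely as an example, and the integration range and step count are arbitrary choices.

```python
import math
from statistics import NormalDist


def normal_pdf(y, mu, sigma):
    """Normal density f(y), written term by term as in the formula."""
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((y - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


MU, SIGMA = 51.7, 12.0   # example values only

# Cross-check one point against the standard library implementation.
matches_library = math.isclose(normal_pdf(60, MU, SIGMA), NormalDist(MU, SIGMA).pdf(60))

# Trapezoid rule over mu +/- 8 sigma, which holds essentially all of the area.
steps = 10_000
lo, hi = MU - 8 * SIGMA, MU + 8 * SIGMA
width = (hi - lo) / steps
area = sum(
    (normal_pdf(lo + i * width, MU, SIGMA) + normal_pdf(lo + (i + 1) * width, MU, SIGMA))
    / 2 * width
    for i in range(steps)
)

print(matches_library, round(area, 6))
```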
Lognormal: right-skewed distribution
Logarithm of random variable is normally distributed
Common in biology.
Why would this occur or be common in biology?
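One commonly given explanation is that many biological quantities arise from multiplicative processes, such as growth by a random proportion at each step. A product of many independent positive factors becomes a sum on the log scale, and sums of many independent terms tend toward a normal distribution. The simulation below is only an illustration of that idea: the number of steps, the range of growth factors, and the seed are all made-up values.

```python
import math
import random
import statistics

rng = random.Random(42)   # arbitrary seed, for reproducibility


def multiplicative_size(n_steps=50):
    """Multiply together n_steps hypothetical growth factors."""
    size = 1.0
    for _ in range(n_steps):
        size *= rng.uniform(0.9, 1.2)   # made-up per-step growth factor
    return size


sizes = [multiplicative_size() for _ in range(5000)]
log_sizes = [math.log(s) for s in sizes]

# Right skew on the raw scale pulls the mean above the median.
print(round(statistics.fmean(sizes), 2), round(statistics.median(sizes), 2))

# On the log scale the mean and median should be close (roughly symmetric).
print(round(statistics.fmean(log_sizes), 2), round(statistics.median(log_sizes), 2))
```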
different, well-defined distributions
allows estimation of probabilities associated with results
Random sampling is crucial for inference:
- sample -> population
- statistics -> parameters
Two main kinds of summary statistics: center and spread
Center:
- Mean (µ, ȳ): sum of sampled values divided by n
- Mode: the most common value in the dataset
- Median: middle measurement of the data; equals the mean for normal distributions
E.g., fish lengths = 20, 30, 35, 24, 36 mm
# A tibble: 1 × 1
mean
<dbl>
1 29
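The same measures of center can be checked by hand in a few lines. This Python sketch uses the five fish lengths from the example. The mode is left out because no value repeats in this small sample.

```python
import statistics

lengths = [20, 30, 35, 24, 36]   # the five fish lengths from the example

# Mean: sum of the sampled values divided by n.
mean_length = sum(lengths) / len(lengths)

# Median: middle value once the data are sorted (20, 24, 30, 35, 36).
median_length = statistics.median(lengths)

print(mean_length, median_length)
```

Here the median (30) sits slightly above the mean (29), which is expected in a sample this small even if the population is symmetric.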
Spread
Variance: sum of squared deviations from the mean, divided by n - 1
(20 - 29)^2 + (30 - 29)^2 + (35 - 29)^2 + (24 - 29)^2 + (36 - 29)^2 = 192
192 / (5 - 1) = 48 mm^2
Problem: weird units!
# A tibble: 1 × 2
mean variance
<dbl> <dbl>
1 29 48
Standard deviation: square root of the variance
In the same units as the observations
In the example: √48 = 6.9 mm
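The variance and standard deviation for the example can be reproduced step by step. The Python sketch below writes the arithmetic out explicitly and then cross-checks it against the standard library, which also divides by n - 1 for a sample.

```python
import math
import statistics

lengths = [20, 30, 35, 24, 36]   # fish lengths in mm from the example
n = len(lengths)
mean_length = sum(lengths) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(y - mean_length) ** 2 for y in lengths]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by n - 1; its units are mm^2.
variance = sum_of_squares / (n - 1)

# Standard deviation is back in mm, the units of the observations.
sd = math.sqrt(variance)

print(sum_of_squares, variance, round(sd, 1))
print(statistics.variance(lengths), round(statistics.stdev(lengths), 1))
```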
Problem: we don’t know the values of the parameters
Goal: estimate parameters from empirical data (samples)
3 general methods of parameter estimation:
- Maximum Likelihood Estimation (MLE)
- Ordinary Least Squares (OLS)
- Resampling techniques
MLE: a general method that estimates parameters by maximizing the likelihood of the observed data given the parameter values.
It aims to find the parameter values that make the observed data most probable under the assumed statistical model.
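As a rough illustration of the idea, the sketch below assumes the five example fish lengths come from a normal distribution and searches a grid of candidate means for the one with the highest log-likelihood. The grid, the fixed value of sigma, and the function name `log_likelihood` are choices made for this example. Real analyses would normally use an optimizer or a closed-form solution rather than a grid.

```python
import math

lengths = [20, 30, 35, 24, 36]   # fish lengths from the earlier example


def log_likelihood(mu, sigma, data):
    """Log-likelihood of the data under an assumed Normal(mu, sigma) model."""
    n = len(data)
    return (
        -n / 2 * math.log(2 * math.pi * sigma ** 2)
        - sum((y - mu) ** 2 for y in data) / (2 * sigma ** 2)
    )


SIGMA_FIXED = 6.0   # arbitrary; the best mu does not depend on this choice

# Candidate means from 20.00 to 40.00 in steps of 0.01.
candidates = [20 + i * 0.01 for i in range(2001)]
best_mu = max(candidates, key=lambda mu: log_likelihood(mu, SIGMA_FIXED, lengths))

print(round(best_mu, 2), sum(lengths) / len(lengths))
```

Under the normal model the candidate that maximizes the likelihood coincides with the sample mean, which is why the two printed values agree.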
OLS: a specific method for estimating the parameters of a linear regression model.
It minimizes the sum of the squared differences between observed and predicted values.
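For a straight-line model with one predictor, the least-squares estimates have a closed form. The Python sketch below uses made-up `x` and `y` values (not course data) and hypothetical helper names `ols_fit` and `sum_squared_residuals`. It then checks that nudging either estimate makes the sum of squared residuals larger.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_fit(x, y):
    """Closed-form least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(intercept, slope, x, y):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = ols_fit(x, y)
best = sum_squared_residuals(intercept, slope, x, y)

# Any other line should fit worse in the least-squares sense.
worse_slope = sum_squared_residuals(intercept, slope + 0.1, x, y)
worse_intercept = sum_squared_residuals(intercept + 0.1, slope, x, y)

print(round(intercept, 2), round(slope, 2), best < worse_slope, best < worse_intercept)
```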